ABSTRACT

Through Monte Carlo procedures, three different
techniques for estimating the parameter theta (proportion of the
"shocks" remaining in the system) in the Integrated Moving Average
(0,1,1) time series model are compared in terms of (1) the accuracy
of the estimates, (2) the independence of the estimates from the true
value of theta, and (3) the independence of the estimates from a
"shift in level" in the time series following an intervention. In the
"usual" range for theta, the methods appear equally accurate. One
produces complex estimates in special cases. Estimates are
independent of the true value and changes in level. (Author)
1. Background

The Integrated Moving Average (IMA) models for analysis of time
series data have been increasingly useful in the behavioral sciences,
including educational research. Specifically, these models are
well-suited for testing hypotheses arising from interventions in either
experimental or non-experimental situations; the researcher can
compare a variable's pattern of behavior before the intervention
has occurred with its behavior afterwards, and can do so without having
to meet common assumptions of stochastic independence of observations
(see Glass, Willson, and Gottman, 1975, for methods and examples).

Of these models, the model IMA (0,1,1) is frequently identified
as a good descriptor of sample time series data. This model has the
form

(1.1)   $z_t - z_{t-1} = a_t - \theta a_{t-1}$

where $z_t$ = observation or datum recorded at time period t, $a_t$ = random
"shock" at time t, and $\theta$ (theta) = a fixed constant. It postulates
(in words) that the difference between two consecutive observations
is due to a random shock at the time of the current observation, minus
(or plus, depending on the sign of $\theta$) some fixed proportion ($\theta$) of
the shock "left over" from the preceding observation.

The single parameter $\theta$ measures "carryover" of the influence of
the random shocks; for reasons of mathematical stability, $\theta$ must be in
the interval (-1,+1), and so may indeed be thought of as a proportion.

IMA (0,1,1) can be rearranged in various ways to incorporate
parameters measuring patterns in the data, or changes in patterns
coincident with interventions; such parameters may be used to measure
series level, change in level after intervention, series drift, or
change in series drift after intervention.

For example, appropriate rearrangement of (1.1) yields

(1.2)   $z_t = L + (1 - \theta) \sum_{i=1}^{t-1} a_i + a_t$,

which expresses z as a sum (hence, "integrated" moving average) of
previous and current random shocks; the parameter L has been added to
indicate the "level" of the series previous to observation 1. A
value of L may be estimated from the data, given a suitable value of $\theta$;
more typically, however, it is a change in series level that is of interest.
By postulating (1.2) before a treatment event (or intervention) E occurs,
and by postulating

(1.3)   $z_t = L + \delta + (1 - \theta) \sum_{i=1}^{t-1} a_i + a_t$

after E, one may estimate not only L, but estimate $\delta$ (change in series
level at E) as well. Once again, this estimation requires a suitably
accurate value of $\theta$.
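
The generating process in (1.2) and (1.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_ima011`, its argument names, and the use of Python's `random` module are ours, and the sketch is not the program used in the study.

```python
import random


def simulate_ima011(n_total, n_pre, theta, level=0.0, delta=0.0, seed=None):
    """Generate z_1..z_N from (1.2) before the intervention and (1.3) after it.

    Hypothetical helper: n_pre plays the role of n_1 (points before E),
    level is L, and delta is the change in level at E.
    Returns the series and the shocks that produced it.
    """
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n_total)]  # a_t, normal (0, 1)
    series = []
    carried = 0.0  # running value of (1 - theta) * (sum of earlier shocks)
    for t in range(1, n_total + 1):
        shift = delta if t > n_pre else 0.0  # delta enters only after E
        series.append(level + shift + carried + shocks[t - 1])
        carried += (1.0 - theta) * shocks[t - 1]
    return series, shocks


if __name__ == "__main__":
    z, a = simulate_ima011(n_total=60, n_pre=30, theta=0.4, delta=0.5, seed=1)
    print(len(z), round(z[0] - a[0], 6))  # z_1 equals L + a_1, so this prints 60 0.0
```

Differencing the generated series recovers (1.1) at every time point except the first one after E, where the difference also contains $\delta$.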

Other models may be derived, and parameters defined as needed. A
transformation of the raw data and utilization of the general linear model
permits least-squares estimates of these parameters of interest, along with
appropriate tests of hypotheses using nothing more esoteric than Student's
t-distribution (Glass, Willson, and Gottman, 1975, pp. 136 ff.); all such
procedures, however, necessarily depend on the specific value of $\theta$ used.
Since $\theta$ is itself generally unknown, some procedure must be used for
finding the "appropriate" value.

Three such methods for "choosing" $\theta$ have been suggested. The first
of these selects the value of $\theta$ which minimizes $\sum_{t=1}^{N} a_t^2$ in the general
linear model y = Xb + a; here, y is a column vector of transformed data
defined by $y_1 = z_1$ and $y_t = z_t - z_{t-1} + \theta y_{t-1}$; X is the N x 2 "design"
matrix whose (i,1)th entry is $\theta^{i-1}$, and whose (i,2)th entry is 0 if $i \le n_1$
and $\theta^{i-n_1-1}$ if $i > n_1$ (here $n_1$ = number of time points preceding the
intervention E, and N = total number of time points in the series); b is
the vector $[L, \delta]^T$; and a is a column vector of random shocks (errors)
$a_t$. The quantity $\sum_{t=1}^{N} a_t^2$ is easily computed as
$(y - X\hat{b})^T (y - X\hat{b})$. This method yields the maximum
likelihood estimate of theta. In what follows, we shall refer to this
method as SSE or SSEMIN, for Sum of Squared Errors, MINimized.
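
The SSEMIN procedure described above can be sketched as follows. This is a minimal illustration, not the study's program: the function names, the grid search over $\theta$ in steps of .01, and the hand-solved 2 x 2 normal equations are all assumptions made for the example. It assumes 1 <= n_pre < N so that both columns of X are nonzero.

```python
def transform(z, theta):
    """Transformed data: y_1 = z_1 and y_t = z_t - z_{t-1} + theta * y_{t-1}."""
    y = [z[0]]
    for t in range(1, len(z)):
        y.append(z[t] - z[t - 1] + theta * y[t - 1])
    return y


def design_matrix(n_total, n_pre, theta):
    """Rows (theta^(i-1), 0 if i <= n_pre else theta^(i-n_pre-1)), i = 1..N."""
    rows = []
    for i in range(1, n_total + 1):
        level_col = theta ** (i - 1)
        shift_col = theta ** (i - n_pre - 1) if i > n_pre else 0.0
        rows.append((level_col, shift_col))
    return rows


def sum_squared_errors(z, n_pre, theta):
    """Least-squares fit of y = Xb + a; returns (SSE, L_hat, delta_hat)."""
    y = transform(z, theta)
    x = design_matrix(len(z), n_pre, theta)
    s11 = sum(r[0] * r[0] for r in x)
    s12 = sum(r[0] * r[1] for r in x)
    s22 = sum(r[1] * r[1] for r in x)
    t1 = sum(r[0] * v for r, v in zip(x, y))
    t2 = sum(r[1] * v for r, v in zip(x, y))
    det = s11 * s22 - s12 * s12  # determinant of X'X
    level_hat = (s22 * t1 - s12 * t2) / det
    delta_hat = (s11 * t2 - s12 * t1) / det
    sse = sum((v - r[0] * level_hat - r[1] * delta_hat) ** 2
              for r, v in zip(x, y))
    return sse, level_hat, delta_hat


def ssemin(z, n_pre, step=0.01):
    """Grid value of theta in [-.99, .99] with the smallest SSE (assumed grid)."""
    n_steps = int(round(1.98 / step))
    grid = [round(-0.99 + k * step, 10) for k in range(n_steps + 1)]
    return min(grid, key=lambda th: sum_squared_errors(z, n_pre, th)[0])
```

Because the grid stops at -.99 and .99, an estimate returned at either end corresponds to the "bottoming out" or "topping out" behavior discussed later in the paper.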

The second method is a Bayesian approach: we use the computed value
of $s^2 = (y - Xb)^T (y - Xb)/(N - 2)$ to define the function
$h(\theta \mid z) = |X^T X|^{-1/2} s^{-(N-2)}$,
and choose $\theta$ such that h is maximized. This method assumes an "uninformed" prior
distribution. Box and Tiao (1965, p. 189) give an explicit formula for h for
the case of models (1.2) and (1.3). Hereafter we shall refer to this procedure
as PD or PDMAX, for Posterior Distribution MAXimization.

The third method merely solves for $\theta$ in the theoretical identity

(1.4)   $\rho_1 = -\theta / (1 + \theta^2)$

(Box and Jenkins, 1970, p. 69), where $\rho_1$ is the lag-1 autocorrelation (which
can easily be estimated from the data). We refer to this method as CORR.
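
Solving (1.4) for $\theta$ amounts to solving the quadratic $\rho_1 \theta^2 + \theta + \rho_1 = 0$ and keeping the root inside (-1,+1). The sketch below illustrates this; the function names are ours, and the assumption that the lag-1 autocorrelation is estimated from the first differences $z_t - z_{t-1}$ with the usual sample estimator is made for the example rather than taken from the study's program.

```python
import math


def lag1_autocorrelation(w):
    """Usual sample lag-1 autocorrelation of a sequence w."""
    n = len(w)
    mean = sum(w) / n
    num = sum((w[t] - mean) * (w[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in w)
    return num / den


def theta_from_rho(rho):
    """Root of rho*theta^2 + theta + rho = 0 lying in [-1, 1].

    Returns None when |rho| > .5, where both roots are complex.
    """
    if rho == 0.0:
        return 0.0
    disc = 1.0 - 4.0 * rho * rho
    if disc < 0.0:
        return None
    return (-1.0 + math.sqrt(disc)) / (2.0 * rho)


def corr_estimate(z):
    """CORR estimate of theta from a raw series z (differences assumed)."""
    diffs = [z[t] - z[t - 1] for t in range(1, len(z))]
    return theta_from_rho(lag1_autocorrelation(diffs))


if __name__ == "__main__":
    print(round(theta_from_rho(-0.4), 6))  # theta = .5 gives rho_1 = -.4
    print(theta_from_rho(0.6))             # None: no real solution
```

The discriminant $1 - 4\rho_1^2$ is negative exactly when $|\rho_1| > .5$, which is why CORR can fail to produce a real-valued estimate.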
2. Objectives

No decision rule exists for "selecting" the "appropriate" value
of theta. In fact, no procedures are available for determining
whether one method should be preferable to the others. Although the
values of theta produced by the three methods are frequently in close
agreement, there are instances in which they may differ widely. Three
examples will illustrate the potential difficulties.

Figures 1, 2, and 3 represent time series generated from random
numbers and preassigned parameter values. In each case, an
IMA (0,1,1) model equivalent to (1.2) and (1.3) was used to generate
the series, with $n_1$ = 30, N = 60, L = 0, $\delta$ = 0, and $\theta$ = .40. The
error terms were NID (0,1). The results are summarized below:

SERIES    SSEMIN theta    PDMAX theta    CORR theta    TRUE theta

  1           .77             .56           .25           .40

  2           .99             .99           .45           .40

  3           .99             .31        undefined        .40

Series 1 is distinguished by complete disagreement between the three
methods, with differences on the order of .2. In Series 2, SSEMIN and
PDMAX have "topped out," producing estimates at or near the upper limit
of permissible values of $\theta$; note, however, that CORR has produced a
good estimate of $\theta$. Series 3 displays yet another "pathological"
situation: SSEMIN has topped out, PDMAX appears normal, and CORR has
produced a complex estimate of $\theta$! (The latter circumstance occurs
whenever $|\rho_1| > .5$.) It should be noted here that these examples were not
contrived; they appeared in the first 100 time series generated during
the testing of the computer programs used in this study.
Figure 1. A Time Series Defined by $z_t - z_{t-1} = a_t - .4a_{t-1}$, for which
SSEMIN $\hat{\theta}$ = .77, PDMAX $\hat{\theta}$ = .56, and CORR $\hat{\theta}$ = .25. (Raw data values are
given below.)

Figure 2. A Time Series Defined by $z_t - z_{t-1} = a_t - .4a_{t-1}$, for which
SSEMIN $\hat{\theta}$ = .99, PDMAX $\hat{\theta}$ = .99, and CORR $\hat{\theta}$ = .45. (Raw data values are
given below.)

Figure 3. A Time Series Defined by $z_t - z_{t-1} = a_t - .4a_{t-1}$, for which
SSEMIN $\hat{\theta}$ = .99, PDMAX $\hat{\theta}$ = .31, and CORR $\hat{\theta}$ is undefined. (Raw data values
are given below.)
Thus, we ask the following questions:

(1) How accurately do the three methods estimate theta?

(2) To what extent does each method's accuracy depend on the true
value of theta?

(3) To what extent does the value of another parameter in the model
(namely, a change in series level: $\delta$) influence the accuracy of
each method?

3. Method

"Monte Carlo" simulation techniques were deemed appropriate, and
were utilized on the University of Minnesota's Control Data Cyber 74
computer.

Twenty populations of time series of the form shown in (1.2) and
(1.3) were defined: ten for which theta was given a value of .99, .9,
.7, .5, .3, .1, 0, -.3, -.5, and -.99, respectively, and delta was zero,
and ten more with the same values of theta, and delta = .5. (More
positive values than negative were used for theta because theta is
nearly always positive in the real world.) For each of these 20 populations,
1000 sample series were generated; each of these series had $n_1$ = 30,
N = 60, L = 0, and used random shocks that were normal, independent,
with mean 0 and variance 1. For each of the 20,000 sample series thus
defined, theta was estimated from the data by the methods SSEMIN, PDMAX,
and CORR; these numbers, plus the lag-1 autocorrelation (referred to
hereafter as LAG) were retained, and descriptive statistics computed.
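
One cell of this design can be imitated with a short Monte Carlo loop. The sketch below is a simplified illustration only: it sets delta = 0, generates the differenced series $a_t - \theta a_{t-1}$ directly, and counts how often the sample lag-1 autocorrelation reaches .5 in absolute value (the condition under which CORR has no real solution). The function names, the default seed, and the choice to work with differences are assumptions of the example, not details taken from the study.

```python
import random


def lag1_autocorrelation(w):
    """Usual sample lag-1 autocorrelation of a sequence w."""
    n = len(w)
    mean = sum(w) / n
    num = sum((w[t] - mean) * (w[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in w)
    return num / den


def out_of_range_rate(theta, n_series=1000, n_points=60, seed=0):
    """Fraction of simulated series whose lag-1 autocorrelation has |r| >= .5."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_series):
        shocks = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        # First differences of an IMA (0,1,1) series with delta = 0.
        diffs = [shocks[t] - theta * shocks[t - 1] for t in range(1, n_points)]
        if abs(lag1_autocorrelation(diffs)) >= 0.5:
            count += 1
    return count / n_series


if __name__ == "__main__":
    for theta in (0.0, 0.5, 0.9):
        print(theta, out_of_range_rate(theta, n_series=300))
```

Since the true lag-1 autocorrelation approaches -.5 as theta approaches 1, the rate is expected to rise with theta; the exact figures depend on the seed and on the estimator used.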

For each preassigned value of theta, a Smirnov two-sample
goodness-of-fit test was performed, comparing the distributions for which
$\delta$ = 0 with those for which $\delta$ = .5 (Conover, 1971, pp. 309-314).
4. Results

Descriptive statistics produced by the 20 computer runs are
displayed in Tables 1-5.

Table 1 shows that SSEMIN and PDMAX are comparably accurate over
all values of $\theta$ tested; the means are within .025 of the true values of
$\theta$, except near the extremes, where differences of .09 or so can occur.
The medians of SSEMIN and PDMAX are similarly accurate, and are generally
better estimates near theta's extreme values. The modes reflect the
topping-out or bottoming-out effect noted previously.

Table 2 shows all three methods to be of surprisingly consistent
accuracy, in the sense that the distributions of $\hat{\theta}$ all have standard
errors on the order of .01, independent of either $\theta$ or $\delta$.

Table 3 reveals (as one might expect) that as the true value of $\theta$
deviates from 0 (the midpoint of its possible range of values) the
distributions of estimates of $\theta$ provided by SSEMIN and PDMAX become less
and less symmetric.

The evidence for CORR is somewhat less encouraging; although it is
substantially easier to compute in practice than either SSEMIN or
PDMAX, we see from Tables 1-3 that the behavior of its estimates is
much less desirable than that of the other methods. Its mean $\hat{\theta}$ appears
to be tolerably accurate only in the range 0 to .6 or so (albeit the most
common real-life range for $\theta$), though less so than the other methods.
It is both "quicker" and "dirtier" than its companions.

CORR does not show a tendency toward skewness at extreme values of
true theta; this lack of "sensitivity", as well as part of the method's
general inaccuracy, can be attributed to the fact that a large portion
of the distributions tested had lag-1 autocorrelations (LAG here) that



Table 1: Measures of Central Tendency Computed for Various Chosen Values of
         Theta and Delta; Tabled Values are Estimates of Theta, Based on
         1000 Computer-Generated Time Series.

Table 2: Measures of Variability Computed for Various Chosen Values of
         Theta and Delta; Tabled Values Refer to Estimates of Theta,
         Based on 1000 Computer-Generated Time Series.
Table 3: Skew and Kurtosis Computed for Various Chosen Values of Theta
         and Delta; Tabled Values Refer to Estimates of Theta, Based
         on 1000 Computer-Generated Time Series.

TRUE         TRUE                       SKEW                                  KURTOSIS
THETA        DELTA        SSEMIN       PDMAX        CORR         SSEMIN       PDMAX        CORR

[illegible]  [illegible]  [illegible]  [illegible]  [illegible]  5.751        [illegible]  .361
.3           .0           [illegible]  [illegible]  [illegible]  [illegible]  10.133       [illegible]
.3           .5           [illegible]  [illegible]  [illegible]  8.022        10.172       -.070
.5           .0           [illegible]  -1.955       [illegible]  12.837       17.527       -.195
.5           .5           -2.669       -1.970       [illegible]  12.972       [illegible]  -.193
.7           .0           [illegible]  [illegible]  .166         17.795       35.009       [illegible]
.7           .5           [illegible]  [illegible]  .039         20.560       [illegible]  -.590
.9           .0           [illegible]  [illegible]  .056         [illegible]  [illegible]  [illegible]
.9           .5           -6.127       -8.276       .015         [illegible]  89.053       [illegible]
.99          .0           [illegible]  -11.311      .097         [illegible]  [illegible]  [illegible]
.99          .5           [illegible]  -9.072       -.135        32.093       87.367       -.272
fell out of range (see Table 5). Without this truncation, the LAG
estimates provided good estimates of the true lag-1 autocorrelation
(which can then be transformed to theta via (1.4)). Summary statistics
of these distributions of nontruncated LAG estimates appear in Table 4.

(Table 5 also displays percentages of the samples tested for which
SSEMIN and/or PDMAX topped- or bottomed-out. This gives us a rough
idea of the expected frequency of these situations.)

Finally, we note from Table 6 that most of the distributions
generated by SSEMIN, PDMAX, and LAG showed a theoretical dependence on the
value of $\delta$, whereas those distributions generated by CORR showed little
dependence on $\delta$. The test statistic being evaluated is the longest
vertical distance between the cumulative distribution functions of the two
sample distributions under scrutiny (Conover, 1971, p. 310).
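
The statistic just described, the largest vertical distance between two empirical cumulative distribution functions, can be computed as in the sketch below. The function names are ours. The large-sample critical value uses the standard two-sided approximation, coefficient times the square root of (n + m)/(nm), with coefficients 1.36 for alpha = .05 and 1.63 for alpha = .01; whether the study used this approximation or exact tables is not stated, so treat it as an assumption.

```python
import math


def smirnov_statistic(x, y):
    """Largest vertical distance between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    largest = 0.0
    while i < n and j < m:
        value = min(xs[i], ys[j])
        # Step both empirical CDFs past every observation equal to `value`.
        while i < n and xs[i] <= value:
            i += 1
        while j < m and ys[j] <= value:
            j += 1
        largest = max(largest, abs(i / n - j / m))
    return largest


def large_sample_critical_value(n, m, coefficient=1.36):
    """Approximate two-sided critical value (1.36 for .05, 1.63 for .01)."""
    return coefficient * math.sqrt((n + m) / (n * m))


if __name__ == "__main__":
    print(smirnov_statistic([1, 2, 3, 4], [3, 4, 5, 6]))       # 0.5
    print(round(large_sample_critical_value(1000, 1000), 4))   # 0.0608
```

With two samples of 1000 estimates each, as in this study, the approximate critical values are about .061 at alpha = .05 and .073 at alpha = .01.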

5. Conclusions

SSEMIN and PDMAX appear to estimate theta adequately in all ranges
of true theta. CORR is less accurate, especially outside the range .0
to .6, although the lag-1 autocorrelations (LAG) of samples are good
estimators of the true autocorrelation $\rho_1$. Practical problems in using
each method include the very real possibility that an estimator will
"top out" or "bottom out", or, in the case of CORR, not exist.






Table 4: Summary Statistics Computed for Various Chosen Values of Theta
         and Delta; Tabled Values Refer to Estimates of the Lag-1
         Autocorrelation, Based on 1000 Computer-Generated Time Series.

TRUE THETA /                                                    VARIABILITY                  HIGHER MOMENTS
TRUE LAG-1        TRUE                                          STD.     STD.
CORRELATION       DELTA   MEAN         MEDIAN       MODE        ERROR    DEV.         VARIANCE  SKEW    KURTOSIS

-.99 /  .499      .0      [illegible]  [illegible]  .510        .004     .136         .018      -.336   .246
-.99 /  .499      .5      [illegible]  [illegible]  .370        .004     .137         .019      -.328   -.134
-.5  /  .400      .0      [illegible]  [illegible]  .360        .005     [illegible]  .023      -.294   -.006
-.5  /  .400      .5      .351         .360         .430        .005     .151         .023      -.378   .011
-.3  /  .275      .0      .216         .221         .190        .005     .165         .027      -.182   -.000
-.3  /  .275      .5      .207         [illegible]  .260        .005     .171         .029      -.230   -.258
 .0  /  .0        .0      -.030        -.033        -.040       .006     .179         .032      .069    -.130
 .0  /  .0        .5      [illegible]  -.044        -.080       .006     .190         .036      .073    -.010
 .1  / -.099      .0      -.132        -.135        -.210       .006     .179         .032      .169    -.319
 .1  / -.099      .5      -.120        -.123        -.170       .006     .182         .033      .234    -.016
 .3  / -.275      .0      -.279        -.292        [illegible] .005     .162         .026      .274    .117
 .3  / -.275      .5      -.280        -.291        -.250       .005     .170         .029      .318    -.043
 .5  / -.400      .0      -.399        [illegible]  -.390       .005     .146         .021      .267    -.007
 .5  / -.400      .5      -.392        -.400        -.410       .005     .145         .021      .347    .002
 .7  / -.470      .0      [illegible]  -.466        -.550       .004     .136         .018      .314    .030
 .7  / -.470      .5      [illegible]  -.461        -.450       .004     .134         .018      .301    -.211
 .9  / -.497      .0      [illegible]  [illegible]  -.480       .004     .131         .017      .260    .175
 .9  / -.497      .5      [illegible]  [illegible]  -.450       .004     .130         .017      .410    .266
 .99 / -.499      .0      [illegible]  -.490        -.560       .004     .332         .017      .307    -.130
 .99 / -.499      .5      [illegible]  -.500        -.520       .004     .127         .016      .495    .491
Table 5: Percentage of 1000 Computer-Generated Time Series Judged
         "Out of Range." For SSE and PD, BOT = % of the distribution with
         $\hat{\theta} \le -.99$, and TOP = % of the distribution with $\hat{\theta} \ge .99$; for LAG,
         BOT = % of the distribution with $\hat{\rho}_1 \le -.5$, and TOP = % of the
         distribution with $\hat{\rho}_1 \ge .5$.

TRUE THETA /
TRUE LAG-1        TRUE           SSE                          PD                                     LAG
CORRELATION       DELTA   BOT    MID    TOP     BOT          MID          TOP          BOT          MID          TOP

-.99 /  .499      .0      85.7   12.6    1.7    49.6         50.2         0.2          0.0          68.3         31.7
-.99 /  .499      .5      32.8   65.9    1.3    19.9         80.0         0.1          0.0          [illegible]  [illegible]
-.5  /  .400      .0       9.7   87.1    3.2    7.5          91.5         1.0          0.0          84.0         16.0
-.5  /  .400      .5       9.1   88.0    2.9    [illegible]  [illegible]  [illegible]  0.0          84.2         15.8
-.3  /  .275      .0       5.6   92.1    2.3    3.3          95.0         1.1          0.0          96.3         3.7
-.3  /  .275      .5       5.9   91.7    2.4    3.9          95.3         0.8          0.0          97.3         2.1
 .0  /  .0        .0       3.7   93.0    3.3    1.5          97.4         1.1          0.2          99.5         [illegible]
 .0  /  .0        .5       2.7   94.4    2.9    1.0          97.9         1.1          [illegible]  99.3         [illegible]
 .1  / -.099      .0       3.6   92.4    4.0    1.2          96.2         2.6          1.3          93.7         [illegible]
 .1  / -.099      .5       3.1   92.9    4.0    0.9          97.4         1.7          [illegible]  [illegible]  [illegible]
 .3  / -.275      .0       2.5   92.1    5.4    0.6          95.2         3.5          8.5          91.5         0.0
 .3  / -.275      .5       2.7   91.0    6.3    1.1          94.9         4.0          9.9          90.1         0.0
 .5  / -.400      .0       2.8   89.0    8.2    0.4          94.3         5.3          26.9         73.1         0.0
 .5  / -.400      .5       2.3   88.6    9.1    0.8          92.7         6.5          26.0         74.0         0.0
 .7  / -.470      .0       3.1   79.7   17.2    0.8          86.6         12.6         42.5         57.5         0.0
 .7  / -.470      .5       2.4   80.1   17.5    0.6          86.1         13.3         41.0         59.0         0.0
 .9  / -.497      .0       3.2   53.5   43.3    0.6          71.6         27.8         47.0         53.0         0.0
 .9  / -.497      .5       2.0   61.8   36.2    0.6          75.5         23.9         47.0         53.0         0.0
 .99 / -.499      .0       3.0   10.6   86.4    0.4          48.7         50.9         48.6         51.4         0.0
 .99 / -.499      .5       2.6   64.0   33.4    0.8          87.1         12.1         51.9         48.1         0.0
Table 6: Smirnov Two-Sample Test Statistics, Comparing $\hat{\theta}$ Distributions
         with $\delta$ = 0 to those with $\delta$ = .5. * = significant at alpha = .05,
         ** = significant at alpha = .01; all tests are 2-tailed.

TRUE
THETA      SSEMIN     PDMAX      CORR       LAG

-.99       .895**     .536**     .057       .100**
-.5        .098**     .080**     .054       .068*
-.3        .064*      .067*      .046       .050
 .0        .072*      .078**     .053       .057
 .1        .103**     .105**     .071       .071*
 .3        .065*      .068*      .048       .056
 .5        .091**     .076**     .049       .051
 .7        .175**     .133**     .051       .062*
 .9        .433**     .278**     .042       .043
 .99       .864**     .525**     .095*      .075**
Each estimation method is consistently accurate, in the sense that
if the specific estimate $\hat{\theta}$ is thought of as a sample chosen from a
theoretical distribution of $\hat{\theta}$, then the standard error of the estimate
is likely to be less than .01.

Although the presence of a change in level has little practical
impact on the estimated value, other investigation
reveals (Table 6) that the value of $\delta$ does change the nature of the
theoretical distribution of estimates of theta.
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